The data, pulled from the CPD data warehouse, has the following format:

##      DATEOCC YEAR MONTH DAY DOW CURR_IUCR FBI_CD AREA BEAT DISTRICT
## 1 2008-01-01 2008     1   1 Tue       610      5    2  623        6
## 2 2008-01-01 2008     1   1 Tue       610      5    5 1421       14
## 3 2008-01-01 2008     1   1 Tue       610      5    1  824        8
##   X_COORD Y_COORD LOCATION INC_CNT
## 1 1179897 1852178      210       1
## 2 1157184 1911751      210       1
## 3 1159285 1867501      290       1

A preview of the variables

## 'data.frame':    161738 obs. of  14 variables:
##  $ DATEOCC  : Date, format: "2008-01-01" "2008-01-01" ...
##  $ YEAR     : int  2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
##  $ MONTH    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DAY      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DOW      : Factor w/ 7 levels "Sun","Mon","Tue",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ CURR_IUCR: Factor w/ 4 levels "610","620","630",..: 1 1 1 1 2 1 1 1 2 1 ...
##  $ FBI_CD   : Factor w/ 1 level "5": 1 1 1 1 1 1 1 1 1 1 ...
##  $ AREA     : Factor w/ 5 levels "1","2","3","4",..: 2 5 1 5 1 4 5 2 1 2 ...
##  $ BEAT     : Factor w/ 301 levels "111","112","113",..: 68 174 97 178 112 168 293 35 96 265 ...
##  $ DISTRICT : Factor w/ 26 levels "1","2","3","4",..: 6 14 8 14 9 13 25 3 8 22 ...
##  $ X_COORD  : int  1179897 1157184 1159285 1158917 1170421 1159992 1150997 1187783 1158765 1172773 ...
##  $ Y_COORD  : int  1852178 1911751 1867501 1916149 1881971 1901896 1919848 1855603 1862371 1835967 ...
##  $ LOCATION : Factor w/ 83 levels "","090","092",..: 40 40 63 40 63 39 40 16 63 52 ...
##  $ INC_CNT  : int  1 1 1 1 1 1 1 1 1 1 ...

A summary of the data

summary(BurglaryData)
##     DATEOCC                YEAR          MONTH             DAY       
##  Min.   :2008-01-01   Min.   :2008   Min.   : 1.000   Min.   : 1.00  
##  1st Qu.:2009-07-30   1st Qu.:2009   1st Qu.: 4.000   1st Qu.: 8.00  
##  Median :2011-01-22   Median :2011   Median : 7.000   Median :16.00  
##  Mean   :2011-03-06   Mean   :2011   Mean   : 6.812   Mean   :15.89  
##  3rd Qu.:2012-08-31   3rd Qu.:2012   3rd Qu.:10.000   3rd Qu.:23.00  
##  Max.   :2014-12-31   Max.   :2014   Max.   :12.000   Max.   :31.00  
##                                                                      
##   DOW        CURR_IUCR    FBI_CD       AREA            BEAT       
##  Sun:17727   610:108315   5:161738   1   :46520   421    :  1840  
##  Mon:25003   620: 43787              2   :48406   423    :  1675  
##  Tue:24858   630:  7431              3   :31206   414    :  1505  
##  Wed:24617   650:  2205              4   :11883   835    :  1481  
##  Thu:24642                           5   :23719   831    :  1416  
##  Fri:25843                           NA's:    4   (Other):153818  
##  Sat:19048                                        NA's   :     3  
##     DISTRICT         X_COORD           Y_COORD           LOCATION    
##  8      : 14982   Min.   :1099259   Min.   :1813949   290    :58308  
##  4      : 12674   1st Qu.:1153260   1st Qu.:1855402   090    :53206  
##  25     : 11060   Median :1165290   Median :1873772   210    :23140  
##  7      : 10787   Mean   :1164957   Mean   :1882216   330    : 4863  
##  3      : 10504   3rd Qu.:1176635   3rd Qu.:1911560   261    : 2975  
##  (Other):101728   Max.   :1205079   Max.   :1951601   200    : 2442  
##  NA's   :     3                                       (Other):16804  
##     INC_CNT 
##  Min.   :1  
##  1st Qu.:1  
##  Median :1  
##  Mean   :1  
##  3rd Qu.:1  
##  Max.   :1  
## 

From the summary, we can already see that burglaries follow a weekly pattern: weekends have noticeably fewer incidents than weekdays. The majority of burglaries are of type 0610, which is Forcible Entry (0620: Unlawful Entry; 0630: Attempt Forcible Entry; 0650: Home Invasion). The top three location types are 290: Residence; 090: Apartment; 210: Residence Garage.
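To put a number on the weekday/weekend gap, the counts printed in the summary can be compared directly. The sketch below copies the DOW and CURR_IUCR counts from the summary output; the chi-squared test against equal counts on all seven days is one simple check chosen here for illustration, not necessarily the test used in the original analysis.

```r
# Day-of-week counts copied from the summary output above
dow_counts <- c(Sun = 17727, Mon = 25003, Tue = 24858, Wed = 24617,
                Thu = 24642, Fri = 25843, Sat = 19048)

weekend <- c("Sat", "Sun")
weekend_mean <- mean(dow_counts[weekend])
weekday_mean <- mean(dow_counts[setdiff(names(dow_counts), weekend)])

# Average weekend-day count relative to the average weekday count
ratio <- weekend_mean / weekday_mean

# Goodness-of-fit test against equal counts on all seven days
uniform_test <- chisq.test(dow_counts)

# Share of each burglary type (CURR_IUCR counts from the same summary)
iucr_counts <- c("610" = 108315, "620" = 43787, "630" = 7431, "650" = 2205)
iucr_share <- iucr_counts / sum(iucr_counts)

print(round(ratio, 3))
print(uniform_test$p.value)
print(round(iucr_share, 3))
```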

The summary of how the crime counts are distributed in each area

## 
##     1     2     3     4     5  <NA> 
## 46520 48406 31206 11883 23719     4

and in each district

## 
##     1     2     3     4     5     6     7     8     9    10    11    12 
##  1389  4423 10504 12674  8391 10471 10787 14982  7914  5539  5756  3623 
##    13    14    15    16    17    18    19    20    21    22    23    24 
##  2953  8151  4659  5796  6033  3344  6410  2546  1771  6580  1393  4585 
##    25    31  <NA> 
## 11060     1     3

Note that District 31 has only 1 incident during the 7-year period.
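Frequency tables like the two above, with a separate `<NA>` column, are what `table()` produces when `useNA = "ifany"` is set. The following is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical stand-ins for `BurglaryData`), which also flags districts with very few incidents.

```r
# Hypothetical stand-in for BurglaryData; the values are invented
toy <- data.frame(
  AREA     = factor(c("1", "2", "2", NA, "5")),
  DISTRICT = factor(c("6", "14", "14", NA, "31"))
)

# Counts per level, with missing values reported in their own cell
area_counts     <- table(toy$AREA, useNA = "ifany")
district_counts <- table(toy$DISTRICT, useNA = "ifany")

# Districts observed fewer than 2 times (the threshold is arbitrary)
observed <- table(toy$DISTRICT)
rare_districts <- names(observed)[observed < 2]

print(area_counts)
print(district_counts)
print(rare_districts)
```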

Missing values occur only in the attributes AREA, DISTRICT and BEAT, and most of them fall on the same rows.
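One way to check whether the missing values line up is to collect the row indices of the `NA`s per column and intersect them. The data frame below is a hypothetical toy example, not the actual data.

```r
# Hypothetical toy data with missing values in the three region columns
toy <- data.frame(
  AREA     = c("1", NA, "3", NA, "5"),
  DISTRICT = c("6", NA, "8", "9", "31"),
  BEAT     = c("623", NA, "824", "912", NA)
)

# Number of missing values per column
na_per_column <- colSums(is.na(toy))

# Row indices of the missing values in each column
na_rows <- lapply(toy[c("AREA", "DISTRICT", "BEAT")],
                  function(col) which(is.na(col)))

# Rows where all three attributes are missing at once
all_missing <- Reduce(intersect, na_rows)

print(na_per_column)
print(all_missing)
```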

The area, district and beat polygon maps, drawn from the shapefiles provided by CPD, are shown below.

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "area_bndy"
## with 8 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "district_bndy"
## with 28 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "beat_bndy"
## with 288 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions

A scatter plot of burglary locations for a single day (2014-01-01)

Let’s first aggregate the data by policing beat/district. Both of the plots below are meant to reveal whether different districts have similar seasonal patterns.

The top plot shows the daily crime time series. Note that the series for districts 13, 21 and 23 appear to be truncated. It turned out that data for districts 13, 21 and 23 are only available up to 2012/12/16, 2013/03/02 and 2013/03/01, respectively. For the bottom plot, the crime counts were first grouped by year and then aggregated by district and month.
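The aggregation and the truncation check described above can be sketched in base R. The data frame `toy` is a hypothetical miniature of `BurglaryData` with invented rows; only the column names follow the data preview.

```r
# Hypothetical miniature of BurglaryData (rows are invented)
toy <- data.frame(
  DATEOCC  = as.Date(c("2012-12-15", "2012-12-16", "2012-12-16",
                       "2012-12-16", "2014-12-30", "2014-12-31")),
  DISTRICT = factor(c("13", "13", "13", "8", "8", "8")),
  INC_CNT  = 1L
)
toy$YEAR  <- as.integer(format(toy$DATEOCC, "%Y"))
toy$MONTH <- as.integer(format(toy$DATEOCC, "%m"))

# Daily series: incident counts per date and district
daily <- aggregate(INC_CNT ~ DATEOCC + DISTRICT, data = toy, FUN = sum)

# Monthly counts within each year, per district
monthly <- aggregate(INC_CNT ~ MONTH + YEAR + DISTRICT, data = toy, FUN = sum)

# Last date with any record, per district; an early date hints at truncation
last_date <- sapply(split(toy$DATEOCC, toy$DISTRICT),
                    function(d) format(max(d)))

print(daily)
print(monthly)
print(last_date)
```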

Grouping by beat would give a higher-resolution view of the spatial and temporal patterns. However, as we have nearly 300 beats, we resorted to a heat map instead of a multi-panel plot to show these patterns.

Again, some beats show a strong periodic seasonal pattern with a decreasing trend, while others do not. Burglary counts in adjacent beats are usually close.
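A heat map of this kind needs a beat-by-month count matrix. The sketch below builds one with `xtabs()` and draws it with base `image()`; the toy rows and the two-year month range are invented for illustration, and the original figure may well have been produced with a different plotting function.

```r
# Hypothetical miniature of BurglaryData (rows are invented)
toy <- data.frame(
  BEAT    = factor(c("421", "421", "423", "423", "423", "835")),
  YEAR    = c(2008L, 2008L, 2008L, 2009L, 2009L, 2009L),
  MONTH   = c(1L, 1L, 2L, 1L, 1L, 12L),
  INC_CNT = 1L
)

# Year-month factor with every month as a level, so empty months stay in the matrix
all_months <- sprintf("%d-%02d", rep(2008:2009, each = 12), rep(1:12, times = 2))
toy$YM <- factor(sprintf("%d-%02d", toy$YEAR, toy$MONTH), levels = all_months)

# Beat x month matrix of incident counts
tab <- xtabs(INC_CNT ~ BEAT + YM, data = toy)
m <- matrix(as.numeric(tab), nrow = nrow(tab), dimnames = dimnames(tab))

# Months on the x-axis, beats on the y-axis
image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = t(m),
      xlab = "Month index", ylab = "Beat index")
```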

Now let’s shift from the study of policing regions to a city-wide analysis. Here is an incident location plot for each month of 2014.

It is difficult to tell whether crime location clusters vary over time just by looking at the point plots. Let’s move to a grid (pixel)-based analysis. First, the point data was rasterized by binning it into a 100 \(\times\) 100 grid (the boundaries were defined by the range of the x- and y-coordinates of all available crime locations plus a margin of 1000 units on each side). Below is an example of pixelized violent crime locations in January 2014.
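The binning step can be sketched in base R as follows. The coordinates are randomly generated placeholders for `X_COORD` and `Y_COORD`, and the variable names are chosen here for illustration; only the grid size (100 per side) and the margin (1000 units) come from the description above.

```r
set.seed(1)
# Hypothetical coordinates standing in for X_COORD / Y_COORD
x <- runif(500, 1100000, 1205000)
y <- runif(500, 1814000, 1951000)

n_cells <- 100   # grid cells per side
margin  <- 1000  # padding added to the coordinate range on each side

# Cell boundaries: coordinate range plus the margin, split into n_cells bins
x_breaks <- seq(min(x) - margin, max(x) + margin, length.out = n_cells + 1)
y_breaks <- seq(min(y) - margin, max(y) + margin, length.out = n_cells + 1)

# Cell index (1..n_cells) of every point along each axis
ix <- findInterval(x, x_breaks, all.inside = TRUE)
iy <- findInterval(y, y_breaks, all.inside = TRUE)

# Count points per cell; rows index x-cells, columns index y-cells
counts <- matrix(tabulate((iy - 1L) * n_cells + ix, nbins = n_cells^2),
                 nrow = n_cells)

print(dim(counts))
print(sum(counts))
```

Each point falls in exactly one cell, so the cell counts add up to the number of input points.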

Next, we apply kernel density estimation (KDE) to the monthly aggregation over each year. The kernel is a 2D Gaussian with the same bandwidth in each direction. The bandwidth was selected through cross-validation (minimizing MSE) using all available data (2008-2014).
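The text does not spell out the exact cross-validation procedure, so the following is only one possible reading: a k-fold scheme that fits an isotropic Gaussian KDE on the training folds with `MASS::kde2d` and scores it by the mean squared error against the binned density of the held-out points. The simulated points, the candidate bandwidths, the fold count and the helper `binned_density` are all assumptions made for this sketch.

```r
library(MASS)  # kde2d: 2D KDE with a bivariate normal kernel

set.seed(1)
# Hypothetical point pattern: two clusters
x <- c(rnorm(150, 0, 1), rnorm(150, 4, 1))
y <- c(rnorm(150, 0, 1), rnorm(150, 3, 1))

n_grid <- 50
xb <- seq(min(x) - 1, max(x) + 1, length.out = n_grid + 1)  # cell edges
yb <- seq(min(y) - 1, max(y) + 1, length.out = n_grid + 1)
dx <- xb[2] - xb[1]
dy <- yb[2] - yb[1]
# kde2d evaluates on grid nodes, so pass the cell centers as limits
center_lims <- c(xb[1] + dx / 2, xb[n_grid + 1] - dx / 2,
                 yb[1] + dy / 2, yb[n_grid + 1] - dy / 2)

# Empirical density of a point set on the grid: count / (n * cell area)
binned_density <- function(px, py) {
  ix <- findInterval(px, xb, all.inside = TRUE)
  iy <- findInterval(py, yb, all.inside = TRUE)
  cnt <- matrix(tabulate((iy - 1L) * n_grid + ix, nbins = n_grid^2),
                nrow = n_grid)
  cnt / (length(px) * dx * dy)
}

k <- 5
fold <- sample(rep(seq_len(k), length.out = length(x)))  # fixed fold labels

# Mean held-out MSE for one bandwidth; a scalar h applies to both directions
# (kde2d rescales h internally, so h is not the kernel standard deviation)
cv_mse <- function(h) {
  errs <- sapply(seq_len(k), function(f) {
    train <- fold != f
    est <- kde2d(x[train], y[train], h = h, n = n_grid, lims = center_lims)$z
    mean((est - binned_density(x[!train], y[!train]))^2)
  })
  mean(errs)
}

candidates <- c(0.5, 1, 2, 4, 8)
scores <- sapply(candidates, cv_mse)
h_best <- candidates[which.min(scores)]

# Final density surface with the selected bandwidth
fit <- kde2d(x, y, h = h_best, n = n_grid, lims = center_lims)
print(data.frame(h = candidates, cv_mse = scores))
```

Dedicated point-pattern packages offer their own cross-validated bandwidth selectors, which may be closer to what was actually used here.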

Below is an animation of the KDE for each year (2008-2014).

KDE animation